Demonstration of R Juneja mapping scripts
First things first, load the functions into your environment.
# Load the mapping functions
source("MappingFunctions.R")
ls()[1] "MapDF" "MapDF_FST" "MapVCFLoci"
There should now be three functions in your environment, MapDF, MapDF_FST and MapVCFLoci. Each one uses the same algorithm to map a set of markers that have been mapped to the current Ae. aegypti genome (AaegL1/2/3) and converts them into Juneja (http://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0002652) assembly physically mapped markers where possible. You will also need to make sure the file JunejaGeneticAssembly.csv is in your working directory.
Mapping from a VCF file
If you have a VCF files with markers mapped to the AaegL genome, you just need to provide the path to where the VCF is and it will return a dataframe with Juneja mapped markers. This function requires you have the package VariantAnnotation from Bioconductor installed. It should install this automatically but do let me know if it doesn’t work. Let’s use the VCF file published in Gordana’s Rio paper (http://onlinelibrary.wiley.com/doi/10.1111/eva.12301/full) as an example
BrazilMappedLoci <- MapVCFLoci("Rasic2015Rio.vcf")Total Loci to be mapped: 13193
Total assigned to chromosomes: 9247
Total mapped to physical positions: 8677
Total assigned to misassembled contigs: 1223
Total unknown is 3946
Total assigned to chrom 1: 1955 | chromosome only: 94
Total assigned to chrom 2: 3989 | chromosome only: 152
Total assigned to chrom 3: 3303 | chromosome only: 324
Time taken: 8.560978 seconds.
head(BrazilMappedLoci) CHROM POS ID JunSC JunChr JunBP JuncM
1 supercont1.1 148364 33405 sc1.1b 1 123508582 24.396
2 supercont1.1 148385 33405 sc1.1b 1 123508603 24.396
3 supercont1.1 148433 33405 sc1.1b 1 123508651 24.396
4 supercont1.1 1186392 33369 <NA> NA NA NA
5 supercont1.1 1186412 33369 <NA> NA NA NA
6 supercont1.1 1186620 33370 <NA> NA NA NA
Too easy.
Mapping from a data frame
The next function is handy if you have a table type output that has the supercontig identifiers and the basepair positions within them. I created an output like this by using the following command with vcftools: vcftools --vcf Rasic2015Rio.vcf --weir-fst-pop TB.txt --weir-fst-pop UR.txt --out TBvsUR
TBvsURFST <- read.delim("TBvsUR.weir.fst")
head(TBvsURFST) CHROM POS WEIR_AND_COCKERHAM_FST
1 supercont1.1 148364 NaN
2 supercont1.1 148385 -0.02347510
3 supercont1.1 148433 NaN
4 supercont1.1 1186392 -0.00523313
5 supercont1.1 1186412 -0.02090100
6 supercont1.1 1186620 0.02145190
# I usually rename the last column as it is a bit unwieldy...
colnames(TBvsURFST) <- c("CHROM", "POS", "FST")
TBvsURFST <- MapDF(TBvsURFST)Total Loci to be mapped: 13193
Total assigned to chromosomes: 9247
Total mapped to physical positions: 8677
Total assigned to misassembled contigs: 1223
Total unknown is 3946
Total physically assigned to chrom 1: 1955 | chromosome only: 94
Total physically assigned to chrom 2: 3989 | chromosome only: 152
Total physically assigned to chrom 3: 3303 | chromosome only: 324
Time taken: 19.85283 seconds.
head(TBvsURFST) CHROM POS FST JunChr JunBP JunSC JuncM
1 supercont1.1 148364 NaN 1 123508582 sc1.1a 24.396
2 supercont1.1 148385 -0.02347510 1 123508603 sc1.1a 24.396
3 supercont1.1 148433 NaN 1 123508651 sc1.1a 24.396
4 supercont1.1 1186392 -0.00523313 NA NA <NA> NA
5 supercont1.1 1186412 -0.02090100 NA NA <NA> NA
6 supercont1.1 1186620 0.02145190 NA NA <NA> NA
# To get just the mapped positions
TBvsURFST <- TBvsURFST[complete.cases(TBvsURFST), ]Cool so what fun things can we do with the mapped data frame? Well I like making pretty interactive plots with a package called manhattanly.
# if you haven't installed it yet uncomment the next line and run it
# install.packages('manhattanly')
library(manhattanly)
library(plotly)
# Get some pretty colours from RColorBrewer, if you don't have this package
# install.packages('RColorBrewer')
library(RColorBrewer)
colours <- brewer.pal(4, "Dark2")
m <- manhattanr(x = TBvsURFST, chr = "JunChr", bp = "JunBP", p = "FST", logp = FALSE,
annotation1 = "JunSC", annotation2 = "JunBP")
manhattanly(m, genomewideline = F, col = c(colours[1], colours[4], colours[3]),
ylab = "FST", ylim = c(-0.05, 1), suggestiveline = mean(TBvsURFST$FST, na.rm = T),
suggestiveline_color = "yellow", title = "") %>% layout(yaxis = list(range = c(-0.06,
1)))Mapping when you still want to visualise the unassigned contigs
So now you’re probably thinking, ok, that’s pretty and all but we chucked out >5000 markers, what if I want to see them on my visualisation? That’s where the third mapping function comes in. This one creates some false chromosomes so that we can see all of the markers. The function maps to chromosomes 1, 2 and 3 as before, but maps markers mapped to contigs that have been assigned to chromosomes but haven’t had their exact locations mapped to chromosomes 4, 5, and 6, and all other markers are assigned to chromosome 7.
# Let's relaod the dataframe so it doesn't get too messy
TBvsURFST_UN <- read.delim("TBvsUR.weir.fst")
colnames(TBvsURFST_UN) <- c("CHROM", "POS", "FST")
TBvsURFST_UN <- MapDF_FST(TBvsURFST_UN)Total Loci to be mapped: 13193
Total assigned: 11332
Total assigned to misassembled contigs: 1223
Total unknown is 0
Total assigned to chrom 1: 1861
Total assigned to chrom 2: 3837
Total assigned to chrom 3: 2979
Time taken: 18.27814 seconds.
head(TBvsURFST_UN) CHROM POS FST JunChr JunBP JunSC JuncM
1 supercont1.1 148364 NaN 1 123508582 sc1.1a 24.396
2 supercont1.1 148385 -0.02347510 1 123508603 sc1.1a 24.396
3 supercont1.1 148433 NaN 1 123508651 sc1.1a 24.396
4 supercont1.1 1186392 -0.00523313 7 2186392 supercont1.1 NA
5 supercont1.1 1186412 -0.02090100 7 3372804 supercont1.1 NA
6 supercont1.1 1186620 0.02145190 7 4559424 supercont1.1 NA
m <- manhattanr(x = TBvsURFST_UN, chr = "JunChr", bp = "JunBP", p = "FST", logp = FALSE,
annotation1 = "JunSC", annotation2 = "JunBP")
manhattanly(m, genomewideline = F, col = c(colours[1], colours[4], colours[3],
colours[1], colours[4], colours[3], colours[2]), ylab = "FST", labelChr = c(1:3,
"U1", "U2", "U3", "Unassigned"), ylim = c(-0.05, 1), suggestiveline = mean(TBvsURFST_UN$FST,
na.rm = T), suggestiveline_color = "yellow", title = "") %>% layout(yaxis = list(range = c(-0.06,
1)))If there are a lot of unassigned it can take over the graph a bit but the good thing about interactivity is that you can zoom in and out on areas you’re interested in. Static versions of these graphs can be created using the package qqman.
Hopefully these functions will make your life easier when trying to investigate physically mapped Ae. aegypti supercontigs on the Juneja 2014 assembly.
–